Microbiome analysis with QIIME2

…BRIEF INTRO IN PROGRESS…


A tentative Snakemake workflow defines the QIIME 2 bioinformatics rules as a DAG (directed acyclic graph). A detailed, interactive Snakemake HTML report (report.html in the project tree) is available, where you can explore the workflow and its associated statistics. Closing the left sidebar, which overlaps the content, gives a wider display.
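If you want to regenerate the rule graph and the HTML report yourself, the sketch below shows one way to do it. It assumes Snakemake and Graphviz (`dot`) are installed, that you run it from the project root, and that the workflow has already been executed at least once (the report is built from existing results). The function name `make_workflow_docs` and the `DRY_RUN` switch are illustrative, not part of the project.

```shell
# Sketch: rebuild dags/rulegraph.svg and report.html for the workflow.
# With DRY_RUN=1 the commands are only printed, not executed.
make_workflow_docs() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "snakemake --rulegraph | dot -Tsvg > dags/rulegraph.svg"
    echo "snakemake --report report.html"
    return 0
  fi
  mkdir -p dags
  # Draw the rule-level DAG as an SVG image
  snakemake --rulegraph | dot -Tsvg > dags/rulegraph.svg
  # Build the interactive HTML report from the finished run
  snakemake --report report.html
}
```

Running `DRY_RUN=1 make_workflow_docs` first is a cheap way to see what would be executed.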


Getting started with the QIIME 2 pipeline

Get QIIME2 YAML file

wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml

Create the qiime2 environment and install QIIME 2

Current YAML file: qiime2-2023.2-py38-osx-conda.yml (macOS build)

The QIIME 2 YAML file lists over 500 dependencies. Listed below are just a few of the QIIME 2 framework dependencies, enough to get the installation started.

name: qiime2202320
channels:
    - qiime2/label/r2023.2
    - conda-forge
    - bioconda
    - defaults
dependencies:
    - q2cli=2023.2.0
    - qiime2=2023.2.0
    - python=3.8.16
    - q2-alignment=2023.2.0
    - q2-composition=2023.2.0
    - q2-cutadapt=2023.2.0
    - q2-dada2=2023.2.0
    - q2-deblur=2023.2.0
    - q2-demux=2023.2.0
    - q2-diversity=2023.2.0
    - q2-diversity-lib=2023.2.0
    - q2-emperor=2023.2.0
    - q2-feature-classifier=2023.2.0
    - q2-feature-table=2023.2.0
    - q2-fragment-insertion=2023.2.0
    - q2-gneiss=2023.2.0
    - q2-longitudinal=2023.2.0
    - q2-metadata=2023.2.0
    - q2-mystery-stew=2023.2.0
    - q2-phylogeny=2023.2.0
    - q2-quality-control=2023.2.0
    - q2-quality-filter=2023.2.0
    - q2-sample-classifier=2023.2.0
    - q2-taxa=2023.2.0
    - q2-types=2023.2.0
    - q2-vsearch=2023.2.0

Installing QIIME2 using a bash script

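# Start from the base conda environment
conda activate base
# Download the QIIME 2 environment file (macOS build)
wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml
# Create the environment and install QIIME 2 into it
conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml
# Activate the new environment and confirm the installation
conda activate qiime2-2023.2
qiime info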
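The script above downloads the macOS environment file. On Linux the matching 2023.2 file is qiime2-2023.2-py38-linux-conda.yml. As a minimal sketch, the hypothetical helper `qiime2_env_yaml` below picks the file name from the output of `uname -s`; the commented lines show how it could feed the same `wget` and `conda env create` steps.

```shell
# Print the QIIME 2 2023.2 environment file name for the current platform.
qiime2_env_yaml() {
  case "$(uname -s)" in
    Darwin) echo "qiime2-2023.2-py38-osx-conda.yml" ;;
    Linux)  echo "qiime2-2023.2-py38-linux-conda.yml" ;;
    *)      echo "unsupported platform: $(uname -s)" >&2; return 1 ;;
  esac
}

# Example use (not executed here):
#   yml="$(qiime2_env_yaml)"
#   wget "https://data.qiime2.org/distro/core/${yml}"
#   conda env create -n qiime2-2023.2 --file "${yml}"
```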

Downloading demo data

The demo data come from one of the QIIME 2[1] tutorials (Atacama soils).

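# Create the resource directories
mkdir -p resources/reads
mkdir -p resources/references

# Download into resources/reads; wget -O needs the target directory to exist
cd resources/reads
mkdir -p emp-paired-end-sequences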

wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2023.2/tutorials/atacama-soils/sample_metadata.tsv"

wget \
  -O "emp-paired-end-sequences/forward.fastq.gz" \
  "https://data.qiime2.org/2023.2/tutorials/atacama-soils/10p/forward.fastq.gz"

wget \
  -O "emp-paired-end-sequences/reverse.fastq.gz" \
  "https://data.qiime2.org/2023.2/tutorials/atacama-soils/10p/reverse.fastq.gz"

wget \
  -O "emp-paired-end-sequences/barcodes.fastq.gz" \
  "https://data.qiime2.org/2023.2/tutorials/atacama-soils/10p/barcodes.fastq.gz"

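After downloading, it can be worth checking that every file arrived intact before importing anything. The sketch below assumes the file layout produced by the `wget` commands above; `check_downloads` is a hypothetical helper name, and `gzip -t` only tests archive integrity, not FASTQ validity.

```shell
# Check that each expected download exists, is non-empty and, for .gz files,
# passes a gzip integrity test. Returns non-zero if any file fails.
check_downloads() {
  dir="${1:-.}"
  status=0
  for f in sample-metadata.tsv \
           emp-paired-end-sequences/forward.fastq.gz \
           emp-paired-end-sequences/reverse.fastq.gz \
           emp-paired-end-sequences/barcodes.fastq.gz; do
    path="${dir}/${f}"
    if [ ! -s "${path}" ]; then
      echo "missing or empty: ${path}" >&2
      status=1
    elif [ "${path##*.}" = "gz" ] && ! gzip -t "${path}" 2>/dev/null; then
      echo "corrupt gzip: ${path}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

Called without arguments, `check_downloads` inspects the current directory; pass a path such as `resources/reads` to check another location.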
Download a QIIME 2 trained classifier

wget \
  -O "gg-13-8-99-515-806-nb-classifier.qza" \
  "https://data.qiime2.org/2023.2/common/gg-13-8-99-515-806-nb-classifier.qza"

Other pre-trained classifiers are also available. See the QIIME 2 website for more information.
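For orientation, this is roughly how a downloaded classifier is used later in the pipeline. The sketch wraps `qiime feature-classifier classify-sklearn`; the helper names `run` and `classify_taxonomy`, the `DRY_RUN` switch and the file paths in the usage note are assumptions for illustration. Pre-trained classifiers are tied to the scikit-learn version of the QIIME 2 release they were built for, so the classifier and the environment should come from the same release.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking arguments).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assign taxonomy to representative sequences with a pre-trained classifier.
# Arguments: classifier (.qza), representative sequences (.qza), output (.qza).
classify_taxonomy() {
  run qiime feature-classifier classify-sklearn \
    --i-classifier "$1" \
    --i-reads "$2" \
    --o-classification "$3"
}
```

A call might look like `classify_taxonomy resources/gg-13-8-99-515-806-nb-classifier.qza qiime2_process/rep-seqs.qza qiime2_process/taxonomy.qza`.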


Classification methods

De novo clustering

Sequences are clustered against one another.

Closed-reference clustering

Here the clustering is performed at 99% identity against the Greengenes reference database.

Open-reference clustering

Here the sequences are first clustered at 99% identity against the Greengenes reference database; sequences that do not match the reference are then clustered de novo.
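The three clustering modes above map onto three actions of the q2-vsearch plugin. The following is a minimal sketch, not the workflow's actual rules: the helper names, the `DRY_RUN` switch and the output file names are assumptions, and the identity threshold and reference sequences are passed in as arguments so you can match them to your own reference database.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking arguments).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# De novo: features are clustered against one another.
# Arguments: table, sequences, identity (e.g. 0.99), output directory.
cluster_de_novo() {
  run qiime vsearch cluster-features-de-novo \
    --i-table "$1" \
    --i-sequences "$2" \
    --p-perc-identity "$3" \
    --o-clustered-table "$4/table-dn.qza" \
    --o-clustered-sequences "$4/rep-seqs-dn.qza"
}

# Closed-reference: features are clustered against a reference database;
# sequences that do not match are written to a separate artifact.
# Arguments: table, sequences, reference sequences, identity, output directory.
cluster_closed_reference() {
  run qiime vsearch cluster-features-closed-reference \
    --i-table "$1" \
    --i-sequences "$2" \
    --i-reference-sequences "$3" \
    --p-perc-identity "$4" \
    --o-clustered-table "$5/table-cr.qza" \
    --o-clustered-sequences "$5/rep-seqs-cr.qza" \
    --o-unmatched-sequences "$5/unmatched-cr.qza"
}

# Open-reference: closed-reference clustering first, then de novo clustering
# of whatever did not match the reference.
# Arguments: table, sequences, reference sequences, identity, output directory.
cluster_open_reference() {
  run qiime vsearch cluster-features-open-reference \
    --i-table "$1" \
    --i-sequences "$2" \
    --i-reference-sequences "$3" \
    --p-perc-identity "$4" \
    --o-clustered-table "$5/table-or.qza" \
    --o-clustered-sequences "$5/rep-seqs-or.qza" \
    --o-new-reference-sequences "$5/new-ref-seqs-or.qza"
}
```

For example, `DRY_RUN=1 cluster_de_novo feature-table.qza rep-seqs.qza 0.99 qiime2_process` prints the full command without running it.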


Alignment of representative sequences

  • The MAFFT (Multiple Alignment using Fast Fourier Transform) software provides alignments of the representative sequences.
  • The alignment is then masked to remove unconserved and highly gapped positions.
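The two alignment steps above can be sketched with the q2-alignment plugin as follows. The helper names and the `DRY_RUN` switch are illustrative assumptions; the output file names follow those listed under qiime2_process in the project tree.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking arguments).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Align representative sequences with MAFFT, then mask the alignment.
# Arguments: representative sequences (.qza), output directory.
align_and_mask() {
  run qiime alignment mafft \
    --i-sequences "$1" \
    --o-alignment "$2/aligned-rep-seqs.qza" &&
  run qiime alignment mask \
    --i-alignment "$2/aligned-rep-seqs.qza" \
    --o-masked-alignment "$2/masked-aligned-rep-seqs.qza"
}
```

The `&&` makes the masking step run only if the alignment step succeeded.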


Quality control and feature table with DADA2

QIIME 2 uses the DADA2[2] tool for:

  • Detecting low-quality reads in Illumina amplicon sequence data.
  • Denoising.
  • Filtering chimeric sequences.
  • Filtering any phiX reads present in the marker gene sequence data.
  • Constructing the feature table.
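For paired-end data such as the demo reads, the steps listed above are carried out by a single DADA2 action. The sketch below is an assumption-laden example rather than the workflow's own rule: the helper names and `DRY_RUN` switch are illustrative, and the truncation lengths must be chosen from the quality plots of your own demultiplexed reads.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking arguments).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Denoise paired-end reads with DADA2 and build the feature table.
# Arguments: demultiplexed reads (.qza), forward truncation length,
# reverse truncation length, output directory.
denoise_paired() {
  run qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$1" \
    --p-trunc-len-f "$2" \
    --p-trunc-len-r "$3" \
    --o-table "$4/feature-table.qza" \
    --o-representative-sequences "$4/rep-seqs.qza" \
    --o-denoising-stats "$4/stats.qza"
}
```

A call such as `denoise_paired qiime2_process/demux.qza 150 150 qiime2_process` uses 150 only as a placeholder value for both truncation lengths.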



Citation

Please consider citing the iMAP article[3] if you find any part of the iMAP practical user guides helpful in your microbiome data analysis.


References

[1] Bolyen, E., Rideout, J. R., Dillon, M. R., Bokulich, N. A., Abnet, C. C., Al-Ghalith, G. A., … Caporaso, J. G. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology, 37(8), 852–857. https://doi.org/10.1038/s41587-019-0209-9
[2] Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581. https://doi.org/10.1038/nmeth.3869
[3] Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4



Appendix

Project main tree

.
├── LICENSE.md
├── README.md
├── config
│   ├── config.yml
│   ├── pbs
│   ├── samples.tsv
│   ├── slurm
│   └── units.tsv
├── dags
│   ├── rulegraph.png
│   └── rulegraph.svg
├── data
│   ├── README.md
│   ├── logs
│   ├── metadata
│   ├── mothur
│   ├── qiime
│   ├── reads
│   ├── references
│   └── test
├── images
│   ├── 16srrna.png
│   ├── bioinformatics.png
│   ├── bkgd.png
│   └── imap_part05.svg
├── index.Rmd
├── library
│   ├── apa.csl
│   ├── export.bib
│   ├── imap.bib
│   └── references.bib
├── qiime2_process
│   ├── aligned-rep-seqs.qza
│   ├── demux.qza
│   ├── demux.qzv
│   ├── feature-table-dn-99.qza
│   ├── feature-table.qza
│   ├── feature-table.qzv
│   ├── masked-aligned-rep-seqs.qza
│   ├── masked-aligned-rep-seqs.qzv
│   ├── new-ref-seqs-or-85.qza
│   ├── rep-seqs-cr-85.qza
│   ├── rep-seqs-dn-99.qza
│   ├── rep-seqs-or-85.qza
│   ├── rep-seqs.qza
│   ├── rep-seqs.qzv
│   ├── rooted-tree.qza
│   ├── sample-metadata.qzv
│   ├── stats.qza
│   ├── stats.qzv
│   ├── table-cr-85.qza
│   ├── table-or-85.qza
│   ├── taxa-bar-plots.qzv
│   ├── taxonomy.qza
│   ├── taxonomy.qzv
│   ├── unmatched-cr-85.qza
│   └── unrooted-tree.qza
├── report.html
├── resources
│   ├── 85_otus.qza
│   ├── final_fasta
│   ├── gg-13-8-99-515-806-nb-classifier.qza
│   ├── metadata
│   ├── reads
│   └── test
├── results
│   └── project_tree.txt
├── smk.css
├── styles.css
├── tree.sh
└── workflow
    ├── Snakefile
    ├── envs
    ├── report
    ├── rules
    └── scripts

27 directories, 50 files

Troubleshooting and FAQs

  1. Question
    • Answer
  2. Question
    • Answer